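Preface
A Kubernetes worker node in our production environment developed a memory leak. The cause is an incompatibility between kmem (cgroup kernel memory accounting) and the 3.10 kernel.
For details, see cloud.tencent.com/develo…
Upgrading the kernel directly is the simplest fix.
There are three common ways to change the kernel on a Linux system:
- Compile the kernel from source
- Download and install prebuilt packages from the official site
- Install the kernel with the package manager
This article uses two distributions common in enterprise environments, CentOS 7 and Ubuntu 20.04, to show how to update the kernel with the package manager.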
CentOS 7
Start from a freshly created server.
# Check the kernel version
[root@ecs-images ~]# uname -a
Linux ecs-images 3.10.0-1160.92.1.el7.x86_64 #1 SMP Tue Jun 20 11:48:01 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
yum
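Repository configuration
elrepo: a community-maintained project that provides additional packages for RHEL-family distributions. Its packages are tested and verified before release, which helps with stability and compatibility.
Import the elrepo repository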
[root@ecs-images ~]# rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
[root@ecs-images ~]# yum install https://www.elrepo.org/elrepo-release-7.el7.elrepo.noarch.rpm
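Refresh the package cache after importing the repository
This step is optional; yum also builds a new cache automatically when a later command needs it.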
[root@ecs-images ~]# yum makecache
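Important:
After importing a new repository, do not casually run yum update -y. Every operations action should have a deterministic outcome.
Many people run yum update -y to generate a new package cache, but that is a mistake, because it upgrades every updatable package on the system. In production especially, unnecessary upgrades invite unexpected risk.
List the available kernel versions
Kernel flavors:
- kernel-ml: "ml" stands for mainline, i.e. the latest stable mainline release.
- kernel-lt: "lt" stands for long term, i.e. the long-term support releases packaged in elrepo-kernel.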
[root@ecs-images ~]# yum --disablerepo="*" --enablerepo="elrepo-kernel" list available
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
* elrepo-kernel: mirrors.thzhost.com
elrepo-kernel | 3.0 kB 00:00:00
elrepo-kernel/primary_db | 2.1 MB 00:00:05
Available Packages
kernel-lt.x86_64 5.4.251-1.el7.elrepo elrepo-kernel
kernel-lt-devel.x86_64 5.4.251-1.el7.elrepo elrepo-kernel
kernel-lt-doc.noarch 5.4.251-1.el7.elrepo elrepo-kernel
kernel-lt-headers.x86_64 5.4.251-1.el7.elrepo elrepo-kernel
kernel-lt-tools.x86_64 5.4.251-1.el7.elrepo elrepo-kernel
kernel-lt-tools-libs.x86_64 5.4.251-1.el7.elrepo elrepo-kernel
kernel-lt-tools-libs-devel.x86_64 5.4.251-1.el7.elrepo elrepo-kernel
kernel-ml.x86_64 6.4.7-1.el7.elrepo elrepo-kernel
kernel-ml-devel.x86_64 6.4.7-1.el7.elrepo elrepo-kernel
kernel-ml-doc.noarch 6.4.7-1.el7.elrepo elrepo-kernel
kernel-ml-headers.x86_64 6.4.7-1.el7.elrepo elrepo-kernel
kernel-ml-tools.x86_64 6.4.7-1.el7.elrepo elrepo-kernel
kernel-ml-tools-libs.x86_64 6.4.7-1.el7.elrepo elrepo-kernel
kernel-ml-tools-libs-devel.x86_64 6.4.7-1.el7.elrepo elrepo-kernel
perf.x86_64 5.4.251-1.el7.elrepo elrepo-kernel
python-perf.x86_64 5.4.251-1.el7.elrepo elrepo-kernel
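When only the kernel packages themselves matter, the listing above can be narrowed down. The following is a minimal sketch rather than part of the original procedure: list_elrepo_kernels is a hypothetical helper name, and it assumes the three-column layout shown in the output above (package, version, repository).

```shell
# Hypothetical helper: read `yum list available` output on stdin and
# print "package version" for the two main kernel packages only.
list_elrepo_kernels() {
    awk '$1 ~ /^kernel-(lt|ml)\.x86_64$/ { print $1, $2 }'
}

# Possible usage on a host where the elrepo-kernel repo is already configured:
#   yum --disablerepo="*" --enablerepo="elrepo-kernel" list available | list_elrepo_kernels
```

Filtering on the exact package name keeps the -devel, -headers and -tools variants out of the result, so the version to pin is easy to read off.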
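Update the kernel
Install the kernel
Install a specific kernel version, or simply install the latest one. kernel-lt is recommended, since it usually offers better compatibility than kernel-ml.
# Install a specific lt version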
[root@ecs-images ~]# yum --enablerepo=elrepo-kernel install kernel-lt-devel-5.4.251-1.el7.elrepo.x86_64 kernel-lt-5.4.251-1.el7.elrepo.x86_64 -y
# Install the latest lt version
[root@ecs-images ~]# yum --enablerepo=elrepo-kernel install kernel-lt-devel kernel-lt -y
List the kernels present on the system
[root@ecs-images ~]# awk -F\' '$1=="menuentry " {print $2}' /etc/grub2.cfg
CentOS Linux (5.4.251-1.el7.elrepo.x86_64) 7 (Core)
CentOS Linux (3.10.0-1160.92.1.el7.x86_64) 7 (Core)
CentOS Linux (3.10.0-1160.el7.x86_64) 7 (Core)
CentOS Linux (0-rescue-57beda17722b499da37e22c55c2ef57f) 7 (Core)
The newly installed kernel is listed first in /etc/grub2.cfg.
Check which kernel boots by default
[root@ecs-images ~]# grub2-editenv list
saved_entry=CentOS Linux (3.10.0-1160.92.1.el7.x86_64) 7 (Core)
Set the kernel boot order
[root@ecs-images ~]# grub2-set-default 0
Check the default boot kernel again
[root@ecs-images ~]# grub2-editenv list
saved_entry=0
Note that saved_entry changed from the explicit kernel title CentOS Linux (3.10.0-1160.92.1.el7.x86_64) 7 (Core) to the index value 0.
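An index is positional: if another kernel is installed later and lands first in the menu, index 0 will point at that one instead. A possible alternative is to pin the default by title, which grub2-set-default also accepts (the saved_entry value shown earlier was itself a title). The sketch below is an illustration, not the author's method; menu_title_for_kernel is a hypothetical helper that reuses the awk expression from above.

```shell
# Hypothetical helper: print the title of the first menu entry whose title
# contains the given kernel version string.
# $1: kernel version, e.g. 5.4.251-1.el7.elrepo.x86_64
# $2: GRUB config to read (defaults to /etc/grub2.cfg)
menu_title_for_kernel() {
    awk -F\' -v ver="$1" \
        '$1 == "menuentry " && index($2, ver) { print $2; exit }' \
        "${2:-/etc/grub2.cfg}"
}

# Possible usage (run as root on the target host):
#   grub2-set-default "$(menu_title_for_kernel 5.4.251-1.el7.elrepo.x86_64)"
#   grub2-editenv list
```

With a title stored in saved_entry, the default keeps pointing at the same kernel even if the menu order changes after a later install.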
Reboot and verify
[root@ecs-images ~]# reboot
Connection to 172.27.3.74 closed by remote host.
Connection to 172.27.3.74 closed.
@ops-2701 /home/pengyinwei$ ssh root@172.27.3.74 "sudo sh -c 'uname -a '"
root@172.27.3.74's password:
Linux ecs-images 5.4.251-1.el7.elrepo.x86_64 #1 SMP Thu Jul 27 18:49:53 EDT 2023 x86_64 x86_64 x86_64 GNU/Linux
ansible playbook
Orchestrating the steps with an ansible playbook captures the procedure and makes it easy to run again later.
Playbook example
# Show the directory layout
@ops-2701 /ops/scripts/os_init/test$ ls
hosts update-kernel.yml
# hosts file listing the target hosts. Because this is a brand-new machine, ssh login uses a username and password here
@ops-2701 /ops/scripts/os_init/test$ cat hosts
[test]
test-images ansible_host=172.27.3.74 ansible_user="root" ansible_ssh_pass="thisisafakepassword"
# The ansible playbook
@ops-2701 /ops/scripts/os_init/test$ cat update-kernel.yml
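- name: Upgrade Kernel
  hosts: test
  gather_facts: false
  become: yes
  tasks:
    # Import the elrepo GPG key
    # ("warn" is only accepted by older Ansible releases; drop it if your version rejects it)
    - name: Kernel | 导入key
      shell: rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
      args:
        warn: false
    # Install the elrepo repository package
    - name: Kernel | 导入仓库
      shell: yum install https://www.elrepo.org/elrepo-release-7.el7.elrepo.noarch.rpm -y
      args:
        warn: false
    # Install or update the long-term kernel from elrepo-kernel
    - name: Kernel | 更新内核
      yum:
        name: kernel-lt-devel,kernel-lt
        enablerepo: elrepo-kernel
        state: latest
    # Make the first GRUB menu entry the default
    - name: Kernel | 设置内核启动顺序
      shell: grub2-set-default 0
      args:
        warn: false
    # Record the title of the first menu entry (the newly installed kernel)
    - name: Kernel | 记录更新的内核版本
      shell: awk -F\' '$1=="menuentry " {print $2}' /etc/grub2.cfg|head -n 1
      register: kernel_version_1
    # Show the installed kernel and ask the operator to type yes before rebooting
    - name: Kernel | 确认是否重启
      pause:
        prompt: "安装的内核版本是{{ kernel_version_1['stdout_lines'] }}.确认是否重启,请输入(yes)以确定... "
      register: my_pause_1
      delegate_to: localhost
    # Report that the reboot is skipped when the answer is not yes
    - name: Kernel | 确认用户输入
      debug:
        msg: "未输入yes,不进行重启"
      when: my_pause_1.user_input != "yes"
    # Reboot and wait until uname -r succeeds again
    - name: Kernel | 重启
      reboot:
        msg: "等待重启完成..."
        test_command: uname -r
      when: my_pause_1.user_input == "yes"
    # Record the running kernel version after the reboot
    - name: Kernel | 记录当前内核版本
      shell: uname -r
      register: kernel_version_2
      when: my_pause_1.user_input == "yes"
    # Print the running kernel version
    - name: Kernel | 打印当前内核版本
      debug:
        msg: "{{ kernel_version_2['stdout_lines'] }}"
      when: my_pause_1.user_input == "yes"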
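The playbook contains a high-risk operation: reboot. A pause task leaves the decision to the operator, so the reboot runs only when the user types yes instead of running unconditionally.
Execution result
@ops-2701 /ops/scripts/os_init/test$ ansible-playbook -i hosts update-kernel.yml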
PLAY [Upgrade Kernel] **********************************************************************************************
TASK [Kernel | 导入key] **********************************************************************************************
Wednesday 02 August 2023 18:33:28 +0800 (0:00:00.026) 0:00:00.026 ******
changed: [test-images]
TASK [Kernel | 导入仓库] ***********************************************************************************************
Wednesday 02 August 2023 18:33:29 +0800 (0:00:01.223) 0:00:01.249 ******
changed: [test-images]
TASK [Kernel | 更新内核] ***********************************************************************************************
Wednesday 02 August 2023 18:33:31 +0800 (0:00:02.268) 0:00:03.517 ******
ok: [test-images]
TASK [Kernel | 设置内核启动顺序] *******************************************************************************************
Wednesday 02 August 2023 18:33:35 +0800 (0:00:03.628) 0:00:07.146 ******
changed: [test-images]
TASK [Kernel | 记录更新的内核版本] ******************************************************************************************
Wednesday 02 August 2023 18:33:35 +0800 (0:00:00.223) 0:00:07.370 ******
changed: [test-images]
TASK [Kernel | 确认是否重启] *********************************************************************************************
Wednesday 02 August 2023 18:33:35 +0800 (0:00:00.200) 0:00:07.571 ******
[Kernel | 确认是否重启]
安装的内核版本是['CentOS Linux (5.4.251-1.el7.elrepo.x86_64) 7 (Core)'].确认是否重启,请输入(yes)以确定... :
ok: [test-images]
TASK [Kernel | 确认用户输入] *********************************************************************************************
Wednesday 02 August 2023 18:33:39 +0800 (0:00:03.364) 0:00:10.936 ******
TASK [Kernel | 重启] *************************************************************************************************
Wednesday 02 August 2023 18:33:39 +0800 (0:00:00.060) 0:00:10.996 ******
changed: [test-images]
TASK [Kernel | 记录当前内核版本] *******************************************************************************************
Wednesday 02 August 2023 18:34:01 +0800 (0:00:22.328) 0:00:33.325 ******
changed: [test-images]
TASK [Kernel | 打印当前内核版本] *******************************************************************************************
Wednesday 02 August 2023 18:34:01 +0800 (0:00:00.360) 0:00:33.685 ******
ok: [test-images] => {
"msg": [
"5.4.251-1.el7.elrepo.x86_64"
]
}
PLAY RECAP *********************************************************************************************************
test-images : ok=9 changed=6 unreachable=0 failed=0 skipped=1 rescued=0 ignored=0
Wednesday 02 August 2023 18:34:02 +0800 (0:00:00.071) 0:00:33.756 ******
===============================================================================
Kernel | 重启 ------------------------------------------------------------------------------------------------ 22.33s
Kernel | 更新内核 ----------------------------------------------------------------------------------------------- 3.63s
Kernel | 确认是否重启 --------------------------------------------------------------------------------------------- 3.36s
Kernel | 导入仓库 ----------------------------------------------------------------------------------------------- 2.27s
Kernel | 导入key ---------------------------------------------------------------------------------------------- 1.22s
Kernel | 记录当前内核版本 ------------------------------------------------------------------------------------------- 0.36s
Kernel | 设置内核启动顺序 ------------------------------------------------------------------------------------------- 0.22s
Kernel | 记录更新的内核版本 ------------------------------------------------------------------------------------------ 0.20s
Kernel | 打印当前内核版本 ------------------------------------------------------------------------------------------- 0.07s
Kernel | 确认用户输入 --------------------------------------------------------------------------------------------- 0.06s
Ubuntu 20.04
Here too we start from a freshly created server and check the current kernel version.
root@ecs-images-ubuntu:~# uname -a
Linux ecs-images-ubuntu 5.4.0-153-generic #170-Ubuntu SMP Fri Jun 16 13:43:31 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
apt
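Repository configuration
Ubuntu's default repositories are enough to update the kernel; unlike CentOS, no external repository has to be imported.
Refresh the package cache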
root@ecs-images-ubuntu:~# apt update
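Here is an interesting difference.
On CentOS, yum update and yum upgrade both upgrade the available packages directly; the package cache is refreshed with yum makecache.
On Ubuntu, apt update refreshes the package cache, while apt upgrade upgrades the packages.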
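Because the same word means different things on the two systems, a script that runs on both may want to look the command up rather than hard-code it. The sketch below is only an illustration: refresh_cache_cmd is a hypothetical helper, and the distro IDs are assumed to be the ID= values from /etc/os-release (centos and ubuntu).

```shell
# Hypothetical helper: print the command that only refreshes package
# metadata for the given distro ID. It prints the command; it does not run it.
refresh_cache_cmd() {
    case "$1" in
        centos) echo "yum makecache" ;;
        ubuntu) echo "apt update" ;;
        *)      echo "unsupported distro: $1" >&2; return 1 ;;
    esac
}

# Possible usage:
#   . /etc/os-release && refresh_cache_cmd "$ID"
```

Printing the command instead of executing it keeps the helper free of side effects, so the caller decides when anything actually runs.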
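List the available kernel versions
Kernel flavors:
- linux-image-generic: the kernel installed by default on Ubuntu 20.04, also known as the GA (General Availability) kernel. It is the stable kernel series that shipped when Ubuntu 20.04 was released.
- linux-image-generic-hwe-20.04: the HWE (Hardware Enablement) kernel for Ubuntu 20.04, which provides support for newer hardware. It was introduced with the Ubuntu 20.04.2 release.
- linux-image-lowlatency: a kernel for workloads that need low latency, such as audio and video processing. It provides the same features as linux-image-generic, but tunes kernel scheduling to reduce latency.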
root@ecs-images-ubuntu:~# apt show linux-image-generic-hwe-20.04 -a
Package: linux-image-generic-hwe-20.04
Version: 5.15.0.78.85~20.04.38
Priority: optional
Section: kernel
Source: linux-meta-hwe-5.15
Origin: Ubuntu
Maintainer: Ubuntu Kernel Team <kernel-team@lists.ubuntu.com>
Bugs: https://bugs.launchpad.net/ubuntu/+filebug
Installed-Size: 20.5 kB
Provides: spl-modules (= 2.1.5-1ubuntu6~22.04.1), v4l2loopback-modules (= 0.12.7-2ubuntu2~22.04.1), virtualbox-guest-modules (= 5.15.0-78), wireguard-modules (= 1.0.0), zfs-modules (= 2.1.5-1ubuntu6~22.04.1)
Depends: linux-image-5.15.0-78-generic, linux-modules-extra-5.15.0-78-generic, linux-firmware, intel-microcode, amd64-microcode
Recommends: thermald
Download-Size: 2,720 B
APT-Manual-Installed: no
APT-Sources: http://repo.huaweicloud.com/ubuntu focal-updates/main amd64 Packages
Description: Generic Linux kernel image
This package will always depend on the latest generic kernel image
available.
Package: linux-image-generic-hwe-20.04
Version: 5.4.0.26.32
Priority: optional
Section: kernel
Source: linux-meta
Origin: Ubuntu
Maintainer: Ubuntu Kernel Team <kernel-team@lists.ubuntu.com>
Bugs: https://bugs.launchpad.net/ubuntu/+filebug
Installed-Size: 17.4 kB
Provides: virtualbox-guest-modules (= 6.1.6-dfsg-1), wireguard-modules (= 1.0.20200413-1), zfs-modules (= 0.8.3-1ubuntu12)
Depends: linux-image-5.4.0-26-generic, linux-modules-extra-5.4.0-26-generic, linux-firmware, intel-microcode, amd64-microcode
Recommends: thermald
Download-Size: 2,832 B
APT-Sources: http://repo.huaweicloud.com/ubuntu focal/main amd64 Packages
Description: Generic Linux kernel image
This package will always depend on the latest generic kernel image
available.
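Look at the Version fields to see which versions are available.
As shown, Ubuntu takes a different approach from CentOS and does not offer many historical versions to choose from.
Update the kernel
Install the kernel
This example installs the latest linux-image-generic-hwe-20.04.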
root@ecs-images-ubuntu:~# apt install linux-generic-hwe-20.04 -y
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following additional packages will be installed:
libdbus-glib-1-2 libevdev2 libimobiledevice6 libplist3 libupower-glib3 libusbmuxd6
linux-headers-5.15.0-78-generic linux-headers-generic-hwe-20.04 linux-hwe-5.15-headers-5.15.0-78
linux-image-5.15.0-78-generic linux-image-generic-hwe-20.04 linux-modules-5.15.0-78-generic
linux-modules-extra-5.15.0-78-generic thermald upower usbmuxd
Suggested packages:
libusbmuxd-tools fdutils linux-doc | linux-hwe-5.15-source-5.15.0 linux-hwe-5.15-tools
The following NEW packages will be installed:
libdbus-glib-1-2 libevdev2 libimobiledevice6 libplist3 libupower-glib3 libusbmuxd6 linux-generic-hwe-20.04
linux-headers-5.15.0-78-generic linux-headers-generic-hwe-20.04 linux-hwe-5.15-headers-5.15.0-78
linux-image-5.15.0-78-generic linux-image-generic-hwe-20.04 linux-modules-5.15.0-78-generic
linux-modules-extra-5.15.0-78-generic thermald upower usbmuxd
0 upgraded, 17 newly installed, 0 to remove and 25 not upgraded.
...
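Here is another interesting point: the command was apt install linux-generic-hwe-20.04 -y, yet the output shows packages such as linux-headers-5.15.0-78-generic being installed. Let's ask the all-knowing GPT:
linux-generic-hwe-20.04 is actually a metapackage; it installs the latest HWE (Hardware Enablement) kernel and its related dependencies.
In your command output, besides linux-headers-5.15.0-78-generic, other kernel-related packages were installed as well, such as linux-image-5.15.0-78-generic and linux-modules-5.15.0-78-generic.
Together these packages provide the complete kernel. linux-headers-5.15.0-78-generic contains the kernel headers, used to build kernel modules. linux-image-5.15.0-78-generic is the actual kernel image, used to boot the system.
A clear explanation; thumbs up.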
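The same relationship can be read from the package metadata: the Depends field in the apt show output earlier names the concrete packages a metapackage pulls in. The following is a small sketch for pulling that field apart; meta_depends is a hypothetical helper name, and it assumes the "Depends: a, b, c" layout seen in that output.

```shell
# Hypothetical helper: read `apt show <package>` output on stdin and
# print each entry of the Depends field on its own line.
meta_depends() {
    awk -F': ' '$1 == "Depends" {
        n = split($2, deps, ", ")
        for (i = 1; i <= n; i++) print deps[i]
    }'
}

# Possible usage:
#   apt show linux-image-generic-hwe-20.04 2>/dev/null | meta_depends
# apt-cache depends <package> reports similar information directly.
```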
List the kernels present on the system
root@ecs-images-ubuntu:~# grep menuentry /boot/grub/grub.cfg | grep with |cut -d "'" -f 2
Ubuntu, with Linux 5.15.0-78-generic
Ubuntu, with Linux 5.15.0-78-generic (recovery mode)
Ubuntu, with Linux 5.4.0-153-generic
Ubuntu, with Linux 5.4.0-153-generic (recovery mode)
Ubuntu, with Linux 5.4.0-26-generic
Ubuntu, with Linux 5.4.0-26-generic (recovery mode)
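Check which kernel boots by default
By default, Ubuntu boots the first kernel in the list.
Set the kernel boot order
Edit the /etc/default/grub configuration file and change the value of GRUB_DEFAULT.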
root@ecs-images-ubuntu:~# cat /etc/default/grub
GRUB_DEFAULT=0
GRUB_TIMEOUT_STYLE=menu
GRUB_TIMEOUT=10
GRUB_DISTRIBUTOR=`lsb_release -i -s 2> /dev/null || echo Debian`
GRUB_CMDLINE_LINUX_DEFAULT=""
GRUB_CMDLINE_LINUX="net.ifnames=0 consoleblank=600 console=tty0 console=ttyS0,115200n8 nospectre_v2 nopti noibrs noibpb"
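The current value is 0, i.e. index mode: the kernel listed first boots by default.
There are two ways to change it:
- By the menu entry's number, i.e. its index
- By the menu entry's title, e.g. Ubuntu, with Linux 5.15.0-78-generic
(Added later; the environment below differs from the one above.)
Things to watch for when using indexes:
On CentOS, the GRUB menu has a flat structure.
On Ubuntu, the GRUB menu has a main menu and submenus, with its own navigation syntax, for example: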
root@aws-mx-ai-kas-gpu-l40s-02:~# grep menu /boot/grub/grub.cfg
if [ x"${feature_menuentry_id}" = xy ]; then
menuentry_id_option="--id"
menuentry_id_option=""
export menuentry_id_option
set menu_color_normal=white/black
set menu_color_highlight=black/light-gray
menuentry 'Ubuntu' --class ubuntu --class gnu-linux --class gnu --class os $menuentry_id_option 'gnulinux-simple-4b727438-7c0b-4757-a56f-24bd780b3527' {
submenu 'Advanced options for Ubuntu' $menuentry_id_option 'gnulinux-advanced-4b727438-7c0b-4757-a56f-24bd780b3527' {
menuentry 'Ubuntu, with Linux 5.15.0-1072-aws' --class ubuntu --class gnu-linux --class gnu --class os $menuentry_id_option 'gnulinux-5.15.0-1072-aws-advanced-4b727438-7c0b-4757-a56f-24bd780b3527' {
menuentry 'Ubuntu, with Linux 5.15.0-1072-aws (recovery mode)' --class ubuntu --class gnu-linux --class gnu --class os $menuentry_id_option 'gnulinux-5.15.0-1072-aws-recovery-4b727438-7c0b-4757-a56f-24bd780b3527' {
menuentry 'Ubuntu, with Linux 5.15.0-1033-aws' --class ubuntu --class gnu-linux --class gnu --class os $menuentry_id_option 'gnulinux-5.15.0-1033-aws-advanced-4b727438-7c0b-4757-a56f-24bd780b3527' {
menuentry 'Ubuntu, with Linux 5.15.0-1033-aws (recovery mode)' --class ubuntu --class gnu-linux --class gnu --class os $menuentry_id_option 'gnulinux-5.15.0-1033-aws-recovery-4b727438-7c0b-4757-a56f-24bd780b3527' {
menuentry 'Ubuntu, with Linux 5.15.0-97-generic' --class ubuntu --class gnu-linux --class gnu --class os $menuentry_id_option 'gnulinux-5.15.0-97-generic-advanced-4b727438-7c0b-4757-a56f-24bd780b3527' {
menuentry 'Ubuntu, with Linux 5.15.0-97-generic (recovery mode)' --class ubuntu --class gnu-linux --class gnu --class os $menuentry_id_option 'gnulinux-5.15.0-97-generic-recovery-4b727438-7c0b-4757-a56f-24bd780b3527' {
menuentry 'Ubuntu 20.04.6 LTS (20.04) (on /dev/nvme0n1p1)' --class ubuntu --class gnu-linux --class gnu --class os $menuentry_id_option 'osprober-gnulinux-simple-4b727438-7c0b-4757-a56f-24bd780b3527' {
submenu 'Advanced options for Ubuntu 20.04.6 LTS (20.04) (on /dev/nvme0n1p1)' $menuentry_id_option 'osprober-gnulinux-advanced-4b727438-7c0b-4757-a56f-24bd780b3527' {
menuentry 'Ubuntu (on /dev/nvme0n1p1)' --class gnu-linux --class gnu --class os $menuentry_id_option 'osprober-gnulinux-/boot/vmlinuz-5.15.0-1072-aws--4b727438-7c0b-4757-a56f-24bd780b3527' {
menuentry 'Ubuntu, with Linux 5.15.0-1072-aws (on /dev/nvme0n1p1)' --class gnu-linux --class gnu --class os $menuentry_id_option 'osprober-gnulinux-/boot/vmlinuz-5.15.0-1072-aws--4b727438-7c0b-4757-a56f-24bd780b3527' {
menuentry 'Ubuntu, with Linux 5.15.0-1072-aws (recovery mode) (on /dev/nvme0n1p1)' --class gnu-linux --class gnu --class os $menuentry_id_option 'osprober-gnulinux-/boot/vmlinuz-5.15.0-1072-aws-root=PARTUUID=58867aa0-680a-4387-ad0b-6402c0255536 ro recovery nomodeset dis_ucode_ldr panic=-1-4b727438-7c0b-4757-a56f-24bd780b3527' {
menuentry 'Ubuntu, with Linux 5.15.0-1033-aws (on /dev/nvme0n1p1)' --class gnu-linux --class gnu --class os $menuentry_id_option 'osprober-gnulinux-/boot/vmlinuz-5.15.0-1033-aws--4b727438-7c0b-4757-a56f-24bd780b3527' {
menuentry 'Ubuntu, with Linux 5.15.0-1033-aws (recovery mode) (on /dev/nvme0n1p1)' --class gnu-linux --class gnu --class os $menuentry_id_option 'osprober-gnulinux-/boot/vmlinuz-5.15.0-1033-aws-root=PARTUUID=58867aa0-680a-4387-ad0b-6402c0255536 ro recovery nomodeset dis_ucode_ldr panic=-1-4b727438-7c0b-4757-a56f-24bd780b3527' {
menuentry 'Ubuntu, with Linux 5.15.0-97-generic (on /dev/nvme0n1p1)' --class gnu-linux --class gnu --class os $menuentry_id_option 'osprober-gnulinux-/boot/vmlinuz-5.15.0-97-generic--4b727438-7c0b-4757-a56f-24bd780b3527' {
menuentry 'Ubuntu, with Linux 5.15.0-97-generic (recovery mode) (on /dev/nvme0n1p1)' --class gnu-linux --class gnu --class os $menuentry_id_option 'osprober-gnulinux-/boot/vmlinuz-5.15.0-97-generic-root=PARTUUID=58867aa0-680a-4387-ad0b-6402c0255536 ro recovery nomodeset dis_ucode_ldr panic=-1-4b727438-7c0b-4757-a56f-24bd780b3527' {
set timeout_style=menu
# This file provides an easy way to add custom menu entries. Simply type the
# menu entries you want to add after this comment. Be careful not to change
The menu structure is as follows (indexes are zero-based):
Ubuntu                                                    (0)
Advanced options for Ubuntu                               (1) <-- index 1 in the main menu
├── Ubuntu, with Linux 5.15.0-1072-aws                    (0)
├── Ubuntu, with Linux 5.15.0-1072-aws (recovery mode)    (1)
├── Ubuntu, with Linux 5.15.0-1033-aws                    (2) <-- index 2 in the submenu
├── Ubuntu, with Linux 5.15.0-1033-aws (recovery mode)    (3)
...
To select Ubuntu, with Linux 5.15.0-1033-aws, set GRUB_DEFAULT="1>2".
Because indexes start at 0, this selects the third entry of the submenu under the second main-menu item.
Alternatively, use the titles directly: GRUB_DEFAULT="Advanced options for Ubuntu>Ubuntu, with Linux 5.15.0-1033-aws"
After changing the configuration, run:
update-grub
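Counting entries by hand is easy to get wrong, so the index path can also be derived from grub.cfg. The sketch below is one possible way to do it, under two stated assumptions: top-level menuentry and submenu lines start at column 0 while entries inside a submenu are indented, and submenus are nested only one level deep. grub_default_for_title is a hypothetical helper name.

```shell
# Hypothetical helper: print a GRUB_DEFAULT value ("N" or "N>M") for the
# menu entry with the given exact title; exit status 1 if it is not found.
# $1: entry title
# $2: path to grub.cfg (defaults to /boot/grub/grub.cfg)
grub_default_for_title() {
    awk -F"'" -v want="$1" '
        BEGIN { top = -1; child = -1 }
        /^(menuentry|submenu) / {
            top++; child = -1
            if ($2 == want) { print top; found = 1; exit }
            next
        }
        /^[ \t]+menuentry / {
            child++
            if ($2 == want) { print top ">" child; found = 1; exit }
        }
        END { if (!found) exit 1 }
    ' "${2:-/boot/grub/grub.cfg}"
}

# Possible usage:
#   grub_default_for_title 'Ubuntu, with Linux 5.15.0-1033-aws'
```

Under those assumptions, the menu structure shown above would be expected to yield 1>2 for Ubuntu, with Linux 5.15.0-1033-aws. The value still has to be written to GRUB_DEFAULT in /etc/default/grub and followed by update-grub.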
Reboot and verify
root@ecs-images-ubuntu:~# uname -a
Linux ecs-images-ubuntu 5.15.0-78-generic #85~20.04.1-Ubuntu SMP Mon Jul 17 09:42:39 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
ansible playbook
YAML example
- name: Upgrade Kernel
  hosts: test
  gather_facts: false
  become: yes
  tasks:
    # Refresh the apt package cache
    - name: Kernel | 更新缓存
      apt:
        update_cache: yes
    # Install or update the HWE kernel metapackage
    - name: Kernel | 更新内核
      apt:
        name: linux-generic-hwe-20.04
        state: latest
    # Record the title of the first kernel menu entry (the newly installed kernel)
    - name: Kernel | 记录更新的内核版本
      shell: grep menuentry /boot/grub/grub.cfg | grep with | head -n 1 | cut -d "'" -f 2
      register: kernel_version_1
    # Show the installed kernel and ask the operator to type yes before rebooting
    - name: Kernel | 确认是否重启
      pause:
        prompt: "安装的内核版本是{{ kernel_version_1['stdout_lines'] }}.确认是否重启,请输入(yes)以确定... "
      register: my_pause_1
      delegate_to: localhost
    # Report that the reboot is skipped when the answer is not yes
    - name: Kernel | 确认用户输入
      debug:
        msg: "未输入yes,不进行重启"
      when: my_pause_1.user_input != "yes"
    # Reboot and wait until uname -r succeeds again
    - name: Kernel | 重启
      reboot:
        msg: "等待重启完成..."
        test_command: uname -r
      when: my_pause_1.user_input == "yes"
    # Record the running kernel version after the reboot
    - name: Kernel | 记录当前内核版本
      shell: uname -r
      register: kernel_version_2
      when: my_pause_1.user_input == "yes"
    # Print the running kernel version
    - name: Kernel | 打印当前内核版本
      debug:
        msg: "{{ kernel_version_2['stdout_lines'] }}"
      when: my_pause_1.user_input == "yes"
Execution result
@ops-2701 /ops/scripts/os_init/test$ ansible-playbook -i hosts update-kernel.yml
PLAY [Upgrade Kernel] **********************************************************************************************
TASK [Kernel | 更新缓存] ***********************************************************************************************
Wednesday 02 August 2023 19:41:24 +0800 (0:00:00.025) 0:00:00.025 ******
[WARNING]: Updating cache and auto-installing missing dependency: python-apt
changed: [test-images]
TASK [Kernel | 更新内核] ***********************************************************************************************
Wednesday 02 August 2023 19:41:31 +0800 (0:00:07.729) 0:00:07.754 ******
ok: [test-images]
TASK [Kernel | 记录更新的内核版本] ******************************************************************************************
Wednesday 02 August 2023 19:41:32 +0800 (0:00:00.994) 0:00:08.749 ******
changed: [test-images]
TASK [Kernel | 确认是否重启] *********************************************************************************************
Wednesday 02 August 2023 19:41:33 +0800 (0:00:00.284) 0:00:09.033 ******
[Kernel | 确认是否重启]
安装的内核版本是['Ubuntu, with Linux 5.15.0-78-generic'].确认是否重启,请输入(yes)以确定... :
ok: [test-images]
TASK [Kernel | 确认用户输入] *********************************************************************************************
Wednesday 02 August 2023 19:41:41 +0800 (0:00:08.895) 0:00:17.929 ******
TASK [Kernel | 重启] *************************************************************************************************
Wednesday 02 August 2023 19:41:41 +0800 (0:00:00.072) 0:00:18.001 ******
changed: [test-images]
TASK [Kernel | 记录当前内核版本] *******************************************************************************************
Wednesday 02 August 2023 19:42:25 +0800 (0:00:43.139) 0:01:01.141 ******
changed: [test-images]
TASK [Kernel | 打印当前内核版本] *******************************************************************************************
Wednesday 02 August 2023 19:42:25 +0800 (0:00:00.233) 0:01:01.374 ******
ok: [test-images] => {
"msg": [
"5.15.0-78-generic"
]
}
PLAY RECAP *********************************************************************************************************
test-images : ok=7 changed=4 unreachable=0 failed=0 skipped=1 rescued=0 ignored=0
Wednesday 02 August 2023 19:42:25 +0800 (0:00:00.066) 0:01:01.441 ******
===============================================================================
Kernel | 重启 ------------------------------------------------------------------------------------------------ 43.14s
Kernel | 确认是否重启 --------------------------------------------------------------------------------------------- 8.90s
Kernel | 更新缓存 ----------------------------------------------------------------------------------------------- 7.73s
Kernel | 更新内核 ----------------------------------------------------------------------------------------------- 0.99s
Kernel | 记录更新的内核版本 ------------------------------------------------------------------------------------------ 0.28s
Kernel | 记录当前内核版本 ------------------------------------------------------------------------------------------- 0.23s
Kernel | 确认用户输入 --------------------------------------------------------------------------------------------- 0.07s
Kernel | 打印当前内核版本 ------------------------------------------------------------------------------------------- 0.07s
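Summary
Distributions differ widely in style. Copying a procedure from one to another without thought can lead to disaster, and exploring their differences in detail is interesting in its own right.
Operations work calls for a sense of caution: every action performed on production should have a deterministic outcome.
Unless there are special requirements, and provided the system can reach the public internet, installing through the package manager is a simple and effective approach.
Operational procedures that will run many times should avoid manual steps as far as possible: turn common commands into scripts and playbooks (ansible), and as a further iteration turn them into services. Skills should not live in individual people; every successor should be able to pick them up and use them right away.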